tele9752wikiaorg-20200213-history
Advanced topic 4, 2012
Opportunistic Flow-Level Latency Estimation Using Consistent NetFlow The content of this wiki page was adapted from the paper above. All credit goes to original authors. (Topic 4 from Advanced Topics 2012) 'INTRODUCTION' After reading this paper the reader should be able to understand the following concepts: *What is latency and why is it important? *What are flow records and how are they useful? *What is consistent NetFlow? *What is Opportunistic Latency Estimation? *What are the experimental results of this study? What is latency and why is it important? Latency is the delay in data propagation introduced by links and network devices. Bandwidth measures how much data can be shipped in a second, while latency measures how many milliseconds it takes a single bit or packet of data to get to its destination. In modern networks bandwidth bottlenecks can be resolved by the addition of extra links. Latency problems are much harder to resolve. Causes of network latency are: *The speed of light - 3 x 10^8 meters per second squared - is the lower limit on propagation of all electromagnetic signals. In data networks the fastest that can be achieved is about half this. This latency is proportional to the length of the links between network devices *Switching and processing latency is the delay introduced by devices in the data path, and ranges from 1 millsecond for a fast switch up to 10 milliseconds for a slower firewall. Analog modems would introduce up to 50 milliseconds latency at each end of a link. In general the more CPU processing (firewalls, load balancers) the more latency is introduced. Less latency is introduced by Layer 2 switching than layer 3 routing The combined effects of link and device latency result in the latency perceived by the end user. Typical latencies seen on a broadband packet network are: #Within Sydney 10 milliseconds #Within Australia - 30 - 100 milliseconds #Australia to the US - 200 milliseconds What are flow records and how are they useful? Routers take a packet from the ingress port, look up a table to determine where to send it, swap out the destination mac address and forward the packet out the egress port indicated by the routing table. NetFlow was originally a routing technology developed by Cisco around 1990. Prior to NetFlow Cisco routers sent all packets to the router CPU to determine the next hop. NetFlow cached the results of the first lookup in a cache table. Future lookups then matched the destination against the cache reducing the table lookup time. The NetFlow cache also contained the source IP, source port and destination port so that if the packet had some rules applied, for example ACL filtering, and a flow was created matching these parameters, further packets would not require a CPU lookup if they matched the cached flow. When new hardware was introduced that offloaded routing decisions to specialised chips on the router, the NetFlow cache was no longer required for routing, but the records were still in the router. It was discovered that these records contained useful information about the network and avoided the need to implement more complex RMON strategies to get Layer 3 information from network devices for network management purposes. NetFlow information may contain: *Flow usage counters *Start time and stop time of the flow *Interfaces used *QoS flags *IP addresses *Application ports *Routing information NetFlow has been through multiple iterations and is now up to Version 9 which uses templates to determine the format of records which contain up to 87 fields. Due to the size and volume of NetFlow records usually only a sample of the records is exported to a collector which stores the data for analysis. Vendors other than Cisco use similar but incompatible flow recording mechanisms. A standardisation effort is underway at the IETF for the method of exporting flow records (IPFIX) and the method of sampling flow records (PSAMP). IPFIX PSAMP What is consistent NetFlow? Given that we have collected all these flow records from multiple routers across the network, the problem that arises is that if a flow is an entity independent of the router, how do we match flow records from multiple routers to a single flow? This problem is more intractable than it first seems. Problems associated with matching flows across routers include *Time synchronisation of routers may not be accurate so time stamps don't match *Packet loss *Cache expiry due to load *Different sampling across routers - may be random In 2000 to 2002 trajectory sampling was suggested which proposed sending a packet with a special hash payload across the network that could easily be identified. By measuring the arrival time of this packet at each hop, the network latency could be inferred. In 2009 this idea was extended to IETF PSAMP in the following way: *Sample packets at each link *Pseudo random sampling (e.g., 1-out-of-100) *Compute a hash over the invariant fields (same on each hop) of the packet *Packet is selected for reporting (export) if the hash falls within a given range If all routers use the same hash, same input fields and same selection range, the flows that are exported will match between routers, with the result being consistent selection. What is Opportunistic Latency Estimation? Invariant fields in a flow include source and destination IP and source and destination port. These fields define the flow. In hash-based flow sampling the hash is computed over these fields. Including the packet payload in the hash function produces a hash-based packet sample. Packet sampling has some advantages over flow sampling: *Flow sampling can be bursty and uneven due to CPU processing cost associated with processing the flow, while packet sampling smoothes the selection process *Packet sampling is more compatible with traditional approaches to network management tasks which often require analysis of individual packets *Packet sampling preferentially selects large flows which are more often the subject of measurement than small flows Only completed flows have time stamps, one at the beginning of the first packet in the flow and one at the end of the last packet. Some method is needed to estimate the latency of packets within the flow to facilitate packet sampling. Opportunistic estimation makes use of the statistics of smaller flows that occur at the same time as the larger flow being investigated to estimate the start and finish times for individual packets within the flow. Some examples: *the packet delay of a flow of a single packet is the same as the flow delay *the packet delay of a flow of only two packets can be estimated as the delay of the flow divided by two - this is called the end-point estimator *for flows consisting of large numbers of packets, the delay of each packet can be estimated by correlating with the time stamps of smaller flows that occur at the same time as the larger flow - this is called the multiflow estimator. '' What are the experimental results of this study? In this paper, applying a threshold to the number of packets and using the end-point estimator for smaller flows and the multiflow estimator for larger flows is proposed as a heuristic - this is called the ''hybrid estimator. In addition to using intermediate flow delays to estimate packet delays, linear interpolation is used to construct and estimate for delay times for any packet in the flow under study. In this study the interpolated delays are estimated and compared to the actual packet delays to estimate the accuracy of the heuristic. 'RELATED WORK' Compared with existed technique, there are two mainly improvement: 1.It could provide per-flow latency measurement, however onther techniques could only provide per-hope lantancy statistic. 2.It could achieve passive measurement which means no active probes. This technique borrowed some techniques from other domains: *Consistent hashing from trajectory sampling proposed by Duffield in order to ensuring that consistent streams of packets are observed at two routers. *Directly comparing trajectory samples of the same packet observed at different points in order to measure loss and delay passively. *Packet Doppler is a technique to observe how packets that arrived at a router ina window interval get distributed in a discrete time series ofthe same window interval at the other router. * *Lossy difference aggregator(LDA) which can be implemented in routers at high speedsfor measuring various latency characteristics with microsecond precision. 'FLOW COLLECTION ARCHITECTURE' 'A. Current NetFlow Architecture' 'B. Prerequisites' 'C. Consistent NetFlow' *Approach to Measure Dalay Delay characteristics of the forwarding path between two interfaces within a router, or a link across routers which means from the egress interface of one router to the ingress of the other will be measured. The sum of both of these delays will also be measured since NetFlow is turned on in the ingress direction. For instance, the delay between NetFlow instances A and B in the above figure will be measured. Forwrding path latencis in R1 from input port A to output port connected to R3, and the link delay between R1 and R3 will be essentially measured. We assume that the timestamps of the first and last packets for the set of sampled flows have been stored in the NetFlow instances A and B. The key point is that if both A and B sample the same first and last packets for some flow recorded by both (in their set of sampled flows), then we can compute the delay bserved by these packets from the timestamps recorded at both A and B. *Problem Both A and B sample packets are completely independently and the number of commonly recorded flows is small. and thus the first and last packets may be different. As a result, the timestamps recorded at both ends are not consistent and hence the delays which are calculated according to these data may be wrong. *Solution In order to address this poblem, CNF(Consistent NetFlow) is proposed, which means NetFlow equiped with the additional modification that routers use hash-based sampling. Consequently, in CNF, the same packet set will be selectd independently an the timestamps are consisitent as well. *Sampling Mechanisms #flow sampling32 : according to the size of the selection range divided by the size of the set of of possible hash values, the average sampling rate can be determined.An input that uses only invariant fields from the flow key would result in selecton of all packets in a subset of flows. #packet sampling: Including hash input from the packet payload as well would result in selection that looks statistically similar to uniform random sampling, except that packets are selected consistently between different routers. 'D. Flow Correlation' A centralized flow collector will collect the flow records transmitted by the NetFlow process on each routers.But flow records obtained at any downstream NetFlow instance may contain flows that would have potentially taken different paths at the upstream router.Thus, some measures should be taken to obtain the properties of the path between a pair of NetFlow instances. *First Step In order to obtain the intersection set of the flow records obtained from the endpoints.Through the flow keys present in the flow records, the base set of flow records can be easily obtained. *Second Step After sorting records at each instance using the start timestamp, pairwise association can be performed since each flow records has a one-to-one correspondence. *Problems In practice, the theory results can not be achieved for several reasons as follows: #packet loss; #differences in application of flow timeouts due to slight difference in interpacket times; #different cache expiry events due to heavy loads; #uncertainties in timestamps due to propagation delay and clock synchronization issues; *Solutions #Mapping Packet Label to a Timestamp: the NetFlow is required to report a label of the first and last packet in addition to timestamp.In this way, the records can be associate not only through the timestamps but the labels. #Timing Checks to Eliminate Inconsistencies:without requiring changes to the current NetFlow architecture, this solution only utilizes timestamps to identify inconsistencies. Explanation of this solution: The sender exports a set of flow records i with fields （τsi,1，τsi,2，nsi，bsi）being the initial and final packet timestamps, number of packets, and bytes, respectively. The receiver-side flows have orresponding fields（τri,1，τri,2，nri，bri）note the same index on sender and receiver side is not assumed to riginate in the same set of packets. We now consider four different quantities e1+ e1- e2 e3 that represent timing uncertainty. *The propagation delay lies in the interval e1- *The packet processing latency is less than e2 *The clock uncertainty is less than e3 Let TinandTact denote the inactive and active flow timeouts,respectively.Criteria (C1)–(C5) applied to matching sender flow and receiver flow . (C1) τri,1 τsi,1 +[ e1-- e3 , e1++ e2+ e3] and τri,2 τsi,2 +[ e1-- e3 , e1++ e2+ e3] This condition says the first and last packet times of receiver flow are compatible with first and last packet times of sender flow , given the uncertainties of propagation, queuing, and clock synchronization. (C2) nsj= nri and bsj= bri Equality of bytes and packets is a necessary condition for the flows to be reporting on the same set of packets. (C3) τri,2-e1-+e3<τsj,1 +Tact .The upper bound for latest possible compatible sending time of the last flow packet seen at the receiver τri,2-e1-+e3 should fall within the active timeout since the first sender packet. (C4) τri,2-e1-+e3 < τsj,2 +Tin This condition is similar to (C3), except that it rules out sender packets being excluded from the flow due to inactive timeout since the last sender packet. (C5)If a pair of sender and receiver flows (of a particular key) has been discarded, discard all subsequent sender and receiver flows (of the same key) until the separation between successive flows exceedsTin on both sender and receiver sides. This is used because after discard of a pair of flows for violating any of conditions (C2)–(C4), the first times of subsequent flows may refer to different packets on the sender and receiver sides. The condition ensures discard of such “out of sync” flows. 'E. Foundations of Delay Correlation' Once pairs of flow records are associated, we have sender and receiver timestamps from both its first and last packets.We discuss the key foundation of our approach to estimate per-flow latency characteristics of the particular segment using these packet timestamps . In the simplest cases, the lag n autocorrelation of the packet queuing delay, rn ,obeys rn = 1-n(1-p)2 +O(1-p)3 (1) This formulashows that under heavy traffic.noticeable correlations persist at lag n for n up to 1/(1-p)2 In general,rn is a convex function and hence the correlations in practice decay more slowly than the dominant linear behavior in n of (1) and Internet traffic exhibits burstier arrivals than the simple models35. 'F. Latency Estimation' There are three ways that we can use to estimate the latency of the flow. 1. Endpoint Estimator *Estimate the latency of the flow by measuring the latency of the first and last packets of the flow d1 and d2. *Average the delays of the first and last packets of the flow, the formula is shown in figure 1. *This approach has limited accuracy in general, but if the flow is very small, it could be more accuracy. 2. Multiflow Estimator *Average the delay over all packets with sender timestamps between the sender timestamps of the flow. the formula is shown in figure 2. *In figure 2, the ΔMF is the set of packet delays associated with the flow and δ is a delay sample in the set ΔMF. 3. Hybrid Estimator *Hybrid Estimator method tries to blend the best of the Endpoint and Multiflow methods. *By doing Hybrid Estimator method, we must use the Endpoint method for smaller flows and the Multiflow method for larger flows. 'G. Variance and Its Estimation' 1.Problem *Given the expected correlations between delay measurement, how well is the variability of a single measurement predicted? *How does the variability of set of samples relate to that of the packetss in the flow under study itself? *The mean of te statistic of the delay we measured is not enough to well-describe the distribution of the delay. 2. Variance is required for describing the statistic of the delay *Variance can tell us more about the range of the change. *The unbiased estimate of the variance is shown in figure 3. In this formula, D is the sample mean and Δ is the set of samples. *For the Endpoint estimator, the quantity σ² is expected to be extremely noisy, being based on only two data points. 'H. Interpolation of Packet Delays' 1. Problem *If we want to use the endpoint packet delay to estimate the delay of arbitrary packet, the correlation between two packets should be strong enough. *It is hard to construct an setimator in this way since we don't know the sender time of arbitrary packet. 2. Linear interpolation between endpoint delays *Linear interpolation can construct a delay value for any time, we compare it to packet delays of arbitrary flows. *For any time between Τ1 and Τm, let i+® and i-® label the closest delay values in the future and past. The formula is shown in Figure 4. *After that, we can calculate the interpolated delay. The formula is shown in Figure 5. *We can see from the Figure 5 that the delay we predict is linear to the delay we already known. 'EVALUATION' 'A. Experimental Setup' 'B. Associating Flow Records' 'C. Estimator Accuracy' 'D. Sampling and Loss Rate Variation' 'E. Accuracy of Standard Deviation Estimates' 'CONCLUSION' 'REFERENCES' 32 N. Hohn and D. Veitch, “Inverting sampled traffic,” in Proc. ACM SIGCOMM IMC, Oct. 2003, pp. 222–233. 35 V. Paxson and S. Floyd, “Wide-area traffic: The failure of Poissonmodeling,” Comput. Commun. Rev., vol. 24, no. 4, pp. 257–268, 1994.